An improved algorithm for unsupervised decomposition of a multi-author document
Abstract
This paper addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author when the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: a Bayesian segmentation algorithm is applied, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that must be set, and its value had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracies of BayesAD and AK were, in all but one case, worse than that of a baseline approach in which one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists.
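The single-author baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not define the metric, that accuracy is the fraction of sentences attributed to the correct author under the best mapping of predicted groups to true authors. Under that assumption, assigning every sentence to one author scores the share of sentences written by the most frequent author. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def single_author_baseline_accuracy(true_authors):
    """Accuracy of assigning every sentence to a single author.

    Assumption (not stated in the abstract): accuracy is the fraction of
    sentences attributed correctly under the best mapping of predicted
    groups to true authors. With one predicted group, the best mapping
    pairs it with the most frequent true author.
    """
    if not true_authors:
        raise ValueError("need at least one sentence label")
    # Count of sentences written by the most frequent author.
    _, majority_count = Counter(true_authors).most_common(1)[0]
    return majority_count / len(true_authors)


# Hypothetical per-sentence author labels for an 8-sentence document.
labels = ["A", "A", "B", "A", "B", "B", "B", "C"]
baseline = single_author_baseline_accuracy(labels)
print(f"baseline accuracy: {baseline:.3f}")
```

A decomposition method is only informative on a given document if it scores above this value, which is the comparison the abstract reports when controlling for topic.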
Similar resources
Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is an important technique for separating and grouping unlabeled data. In this paper, an attempt has been made to improve and optimize the application of heuristic clustering methods such as the genetic algorithm, PSO, the artificial bee colony algorithm, harmony search, and differential evolution on the unlabeled data of an Iranian bank...
A Novel Intelligent Energy Management Strategy Based on Combination of Multi Methods for a Hybrid Electric Vehicle
Because of the problems caused by today's conventional vehicles, much attention has been given to fuel cell vehicle research. However, a fuel cell system alone is not adequate in transportation applications, because the load power profile includes transients that are not compatible with the fuel cell dynamics. To resolve this problem, hybridization of the fuel cell and energy storage device...
Unsupervised Multi-Author Document Decomposition Based on Hidden Markov Model
This paper proposes an unsupervised approach for segmenting a multi-author document into authorial components. The key novelty is that we utilize the sequential patterns hidden among document elements when determining their authorship. For this purpose, we adopt a Hidden Markov Model (HMM) and construct a sequential probabilistic model to capture the dependencies of sequential sentences and their...
Unsupervised Decomposition of a Multi-Author Document Based on Naive-Bayesian Model
This paper proposes a new unsupervised method for decomposing a multi-author document into authorial components. We assume that we do not know anything about the document and the authors, except the number of the authors of that document. The key idea is to exploit the difference in the posterior probability of the Naive-Bayesian model to increase the precision of the clustering assignment and ...
Evaluating the Effectiveness of Integrated Benders Decomposition Algorithm and Epsilon Constraint Method for Multi-Objective Facility Location Problem under Demand Uncertainty
One of the most challenging issues in multi-objective problems is finding Pareto optimal points. This paper describes an algorithm based on Benders Decomposition Algorithm (BDA) which tries to find Pareto solutions. For this aim, a multi-objective facility location allocation model is proposed. In this case, an integrated BDA and epsilon constraint method are proposed and it is shown that how P...
Journal: JASIST
Volume: 67
Publication date: 2016